Genome-wide scan of pulmonary phenotypes on local ancestry ⟶ genes interacting with smoking

Andrey Ziyatdinov, Postdoctoral Fellow at HSPH

September 27, 2017
Statistical Genetics Meeting
Channing Division of Network Medicine

Outline

  • Background and project’s goals
  • Interaction model on local ancestry
  • Association results
  • Post-association analysis

Gene-environment interaction scans

↑ Sample Size ⟶ ↑ Power for GWAS

But this strategy is generally not successful for interaction GWAS

  • Example: a genome-wide scan of FEV1 and FEV1/FVC in 50,008 individuals from UK Biobank (Wain et al. 2015) found no SNP-smoking interactions at genome-wide significance (p < 5e-08)

One of the alternative strategies is to aggregate genetic variants:

  1. Grouping into Genetic Risk Scores (GRS) (Aschard et al. 2017)
  2. Using the ancestry information (Aschard et al. 2015), (Park et al. 2016)

Genome of recently admixed individuals

Leveraging ancestry information

COPDGene African Americans dataset

  • 3.3K African Americans from the COPDGene project
  • 7 quantitative & 1 binary outcomes
    • FEV1/FEV1pp, FVC/FVCpp, FEV1_FVC, pctEmph_Slicer, TLCpp, finalGold
  • binary exposures
    • SmokCigNow, CigPerDayNow > 15, CompletedSchool > 2

37K long (>10kb) local ancestry segments (Parker et al. 2014)

Project goals

  1. Prove: local ancestry → gene-environment interactions
  2. Follow up on the ancestry-based findings
    • Fine-mapping on SNPs
    • Enrichment analysis
    • Check Gene Expression and Methylation

The COPDGene dataset is appropriate, as the proportion of African ancestry is associated with the risk of COPD (Kumar et al. 2010)

First association model

  • marginal effect: \(y \sim a_g + a_l\)
  • interaction effect: \(y \sim a_g + a_l + x_e + a_g * x_e + a_l * x_e\)
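Written out with explicit coefficients, the two models above can be read as follows. This is only a sketch: the coefficient names \(\beta\) and the residual \(\varepsilon\) are notation introduced here, the covariates are omitted, and the null hypothesis for the marginal model is presumed to be on the local-ancestry term.

```latex
\begin{align}
\text{marginal:} \quad
  y &= \beta_0 + \beta_g a_g + \beta_l a_l + \varepsilon,
  \qquad H_0 : \beta_l = 0 \\
\text{interaction:} \quad
  y &= \beta_0 + \beta_g a_g + \beta_l a_l + \beta_e x_e
     + \beta_{ge} \, a_g x_e + \beta_{le} \, a_l x_e + \varepsilon,
  \qquad H_0 : \beta_{le} = 0
\end{align}
```

The \(a_g x_e\) term plays the same role for the interaction test that \(a_g\) plays for the marginal test: it adjusts the local-ancestry term for its global counterpart.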

Confounding factors other than global ancestry \(a_g\):

  • trait-specific covariates, e.g.
    • FEV1 ~ Age + Age^2 + Gender + Height + PackYears + SmokCigNow
  • random effect on medical centers for all traits (\(\approx\) 3-5% of variance)
  • random effect of medical device for pctEmph_Slicer (\(\approx\) 18% of variance)

This (marginal) model was used in (Parker et al. 2014).
Is it OK for interaction?

First QQ plots: marginal & interaction

Ancestry Relatedness Matrix (ARM) & Heritability

| Trait | \(h^2_{acs}\) |
|---|---|
| FEV1_FVC_utah | 0.0057 |
| FEV1pp_utah | 0.0105 |
| FEV1_utah | 0.0104 |
| FVCpp_utah | 0.0078 |
| FVC_utah | 0.0079 |
| log_pctEmph_Slicer | 0.0076 |
| TLCpp_race_adjusted | 0.0131 |

\(h^2_{acs} = 2 F_{STC} \theta (1 - \theta) h^2\) (Zaitlen et al. 2014), where

  • \(F_{STC}\) measures frequency differences between populations
  • \(\theta\) is the genome-wide ancestry proportion, i.e. global ancestry \(a_g\)

Example: for the simulated data in Figure 1 of (Zaitlen et al. 2014), \(h^2_{acs} = 0.032\) (s.e. 0.007) corresponds to \(h^2 = 0.83\) (s.e. 0.18), with simulation parameters \(F_{STC} = 0.08\) and \(\theta = 0.5\)
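The relation between \(h^2_{acs}\) and \(h^2\) is a simple rescaling, which the short sketch below makes explicit for the simulation parameters quoted above. The function names are hypothetical and chosen only for this illustration.

```python
def h2_acs_from_h2(h2, f_stc, theta):
    """Heritability explained by local ancestry: 2 * F_STC * theta * (1 - theta) * h2."""
    return 2.0 * f_stc * theta * (1.0 - theta) * h2


def h2_from_h2_acs(h2_acs, f_stc, theta):
    """Invert the relation to recover h2 from h2_acs."""
    scale = 2.0 * f_stc * theta * (1.0 - theta)
    if scale <= 0.0:
        raise ValueError("F_STC must be positive and theta strictly between 0 and 1")
    return h2_acs / scale


# Simulation parameters quoted from Figure 1 of (Zaitlen et al. 2014)
F_STC, THETA = 0.08, 0.5

forward = h2_acs_from_h2(0.83, F_STC, THETA)      # about 0.033
backward = h2_from_h2_acs(0.032, F_STC, THETA)    # about 0.80
se_backward = h2_from_h2_acs(0.007, F_STC, THETA) # the s.e. rescales the same way
print(forward, backward, se_backward)
```

With these parameters the scaling factor is 0.04, so the rounded value 0.032 maps back to about 0.80 and its standard error 0.007 to about 0.18, in line with the figures quoted above up to rounding.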


\(h^2\) for FEV1/FVC in the UK Biobank sample of 84K: ~80% (Ge et al. 2017)

Covariance Matrices & Population stratification

Linear mixed-effects model (LMM): \(y = X \beta + g + g_{int} + e\) (Sul et al. 2016)

Fixed effects: \(X = [\dots; a_g; a_{l_i}; x_e; a_g * x_e; a_{l_i} * x_e]\)

  • association test performed on \(a_{l_i} * x_e\) predictor

Random effects:

  • \(g \sim \mathcal{N}(0, \sigma^2_g ARM)\)
  • \(g_{int} \sim \mathcal{N}(0, \sigma^2_{g_{int}} EARM)\)
  • \(e \sim \mathcal{N}(0, \sigma^2_e I)\)
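Assuming the three random effects are mutually independent (an assumption of this sketch, not stated on the slide), the model implies the following marginal distribution for the phenotype vector. The symbol \(V\) is introduced here for the total covariance.

```latex
\begin{align}
y &\sim \mathcal{N}(X \beta, \; V), \\
V &= \sigma^2_g \, ARM + \sigma^2_{g_{int}} \, EARM + \sigma^2_e \, I
\end{align}
```

The test on the \(a_{l_i} * x_e\) predictor is then a test on one element of \(\beta\) under this covariance, rather than under \(\sigma^2_e I\) as in ordinary least squares.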

Heteroskedasticity due to SmokCigNow

Unbalanced groups by SmokCigNow / CigPerDaySmokNow:

| CigPerDaySmokNow | 0 | 1-14 | >=15 |
|---|---|---|---|
| N (%) | 657 (20%) | 1,261 (38%) | 1,382 (42%) |

The group CigPerDaySmokNow = 0 has large variance, as all are former smokers

  • Weighted Least Squares (WLS) is not applicable, as relative weights are unknown
  • Linear mixed-effects model (LMM) solves the issue with an additional random effect
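An exploratory check for this kind of heteroskedasticity is to compare the residual variance across exposure groups. The sketch below does this on made-up residuals (toy numbers, not COPDGene data); the function name and group labels are hypothetical, and the LMM fit itself is not shown.

```python
from collections import defaultdict
from statistics import variance


def residual_variance_by_group(residuals, groups):
    """Return {group label: sample variance of the residuals in that group}."""
    by_group = defaultdict(list)
    for r, g in zip(residuals, groups):
        by_group[g].append(r)
    return {g: variance(values) for g, values in by_group.items()}


# Toy residuals from a hypothetical model fit, four per group
residuals = [-2.1, 1.8, 2.4, -1.9, 0.3, -0.4, 0.2, -0.1, 0.5, -0.6, 0.1, 0.0]
groups = ["0"] * 4 + ["1-14"] * 4 + [">=15"] * 4

var_by_group = residual_variance_by_group(residuals, groups)
ratio = max(var_by_group.values()) / min(var_by_group.values())
print(var_by_group)
print("largest / smallest group variance:", ratio)
```

A ratio far from 1 suggests that a single residual variance is a poor fit and that a group-specific variance component is worth modeling.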

Our approach to fix model misspecification

  1. Ancestry Relatedness Matrix (ARM) (Zaitlen et al. 2014)
  2. Another (EARM) for ancestry-exposure component (Sul et al. 2016)
  3. Modeling heteroskedasticity (Don’t underestimate exploratory plots!)
  4. Selection of smoking covariates
    • SmokCigNow + ATS_PackYearsDuration_Smoking + log_CigPerDaySmokAvg + SmokCigNow + SmokCigNow0_15 + SmokCigarNow

More details in our previous talk, “COPDGene African-Americans & QQ plots”

Results: Clean QQ plots (marginal)

Results: Nearly Clean QQ plots (interaction)

Ancestry-SmokCigNow (7 traits)

Zoom in on Chr 11 (2 repetitive traits)

Multi-trait test (\(p\) traits)

\(z = [z_1; z_2; \dots; z_p]^T \sim \mathcal{N}(0, \Sigma)\) under the null hypothesis

  • Estimate covariance pairs
    • truncated normal
    • threshold 2.5
  • Apply a test

| Test | Statistic | Null distribution |
|---|---|---|
| Omnibus | \(z^T \Sigma^{-1} z\) | \(\chi^2(p)\) |
| sumZ | \((1^T z)^2 / (1^T \Sigma 1)\) | \(\chi^2(1)\) |

(Aschard et al. In prep.)
(Province et al. 2013)
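The two multi-trait statistics can be sketched in plain Python as below, assuming the covariance \(\Sigma\) of the z-scores has already been estimated. The sumZ statistic is normalized by \(1^T \Sigma 1\), the variance of \(1^T z\) under the null, which is what makes it \(\chi^2(1)\). All function names are hypothetical, the input values are toy numbers, and the small linear solver and chi-square tail function are included only to keep the example dependency-free.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-square with integer df >= 1, for x >= 0."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y)
    while a < df / 2.0:
        # Recurrence Q(a + 1, y) = Q(a, y) + y^a * exp(-y) / Gamma(a + 1)
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def omnibus_test(z, sigma):
    """Return (z' Sigma^{-1} z, p-value) with p = len(z) degrees of freedom."""
    x = solve(sigma, z)
    stat = sum(zi * xi for zi, xi in zip(z, x))
    return stat, chi2_sf(stat, len(z))


def sumz_test(z, sigma):
    """Return ((1'z)^2 / (1' Sigma 1), p-value) with 1 degree of freedom."""
    stat = sum(z) ** 2 / sum(sum(row) for row in sigma)
    return stat, chi2_sf(stat, 1)


# Toy input: two traits with correlated z-scores
z = [2.0, 2.0]
sigma = [[1.0, 0.5], [0.5, 1.0]]
omni_stat, omni_p = omnibus_test(z, sigma)
sumz_stat, sumz_p = sumz_test(z, sigma)
print(omni_stat, omni_p)
print(sumz_stat, sumz_p)
```

In this toy case the two statistics coincide, but sumZ spends a single degree of freedom and therefore gives the smaller p-value; it loses power when the per-trait effects have opposite signs, which is where the omnibus test is preferable.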

Results: Omnibus test (5 traits)

Results: Top Genes

Bonferroni 0.05 / 37K = 1.4e-06
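The threshold arithmetic can be checked in a few lines. This is a sketch that takes "37K" as exactly 37,000 segments (the exact count is not given here) and compares the threshold with the smallest single-trait p-value reported for the top segments (7.3e-06); the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that controls the family-wise error at alpha."""
    return alpha / n_tests


N_SEGMENTS = 37_000  # assumption: "37K" local ancestry segments taken as 37,000
threshold = bonferroni_threshold(0.05, N_SEGMENTS)
print(f"{threshold:.1e}")

top_p = 7.3e-06  # smallest single-trait p-value reported for the top segments
print("passes Bonferroni:", top_p < threshold)
```

Under this strict threshold the top hit falls short, which motivates looking for a less conservative correction that accounts for correlation among segments.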


Ancestry segment: 11:12,332,105 - 12,394,102
Genes within \(\pm\) 100kb: PARVA, MICAL2, MICALCL, RASSF10, TEAD1

| Trait | Exposure | z-score | p-value |
|---|---|---|---|
| FEV1pp | SmokCigNow | 4.5 | 7.3e-06 |
| Omnibus | SmokCigNow | | 5.7e-05 |
| FEV1_FVC | SmokCigNow | 3.9 | 1.1e-04 |
| FEV1 | SmokCigNow | 3.8 | 1.4e-04 |

Ancestry segment: 2:238,819,792 - 238,904,351
Genes within \(\pm\) 100kb: TWIST2, HDAC4, MIR4440, MIR4441

| Trait | Exposure | z-score | p-value |
|---|---|---|---|
| FEV1 | SmokCigNow0_15 | 4.2 | 2.6e-05 |
| FEV1pp | SmokCigNow0_15 | 4.1 | 4.9e-05 |
| FVC | SmokCigNow0_15 | 4.0 | 6.9e-05 |
| Omnibus | SmokCigNow0_15 | | 1.5e-04 |
| FVCpp | SmokCigNow0_15 | 3.7 | 2.5e-04 |

Outline

  • Background and project’s goals
  • Interaction model on local ancestry
  • Association results
  • Post-association analysis (ongoing work)

The effective number of tests

Predictors are correlated

  • Bonferroni is conservative
  • Permutation/Resampling takes time
  • The effective number of tests (Davis et al. 2016)
    • Covariance matrix among predictors
    • Eigenvalue decomposition
    • How many eigenvectors are enough to explain 99% of variance

However, the threshold depends on the genetic architecture, e.g. heritability (Joo et al. 2016)
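The eigenvalue-based count described above can be sketched as follows. The function takes the eigenvalues of the predictor correlation (or covariance) matrix as input, so the decomposition itself is assumed to be done elsewhere (for example with `numpy.linalg.eigvalsh`). The function name and the toy eigenvalues are hypothetical, and this is an illustration of the idea rather than the exact procedure of (Davis et al. 2016).

```python
def effective_number_of_tests(eigenvalues, explained=0.99):
    """Smallest number of top eigenvalues whose sum reaches `explained` of the total."""
    vals = sorted((max(v, 0.0) for v in eigenvalues), reverse=True)
    total = sum(vals)
    if total <= 0.0:
        raise ValueError("eigenvalues must have a positive sum")
    running = 0.0
    for k, v in enumerate(vals, start=1):
        running += v
        if running >= explained * total:
            return k
    return len(vals)


# Toy case: 10 predictors forming two blocks of 5 perfectly correlated ones,
# whose correlation matrix has eigenvalues 5, 5 and eight zeros.
eigenvalues = [5.0, 5.0] + [0.0] * 8
m_eff = effective_number_of_tests(eigenvalues)
print("effective tests:", m_eff)
print("adjusted threshold:", 0.05 / m_eff)

# Uncorrelated predictors give almost no reduction at the 99% level
m_eff_indep = effective_number_of_tests([1.0] * 100)
print("independent case:", m_eff_indep)
```

The more strongly neighboring segments are correlated, the fewer eigenvalues are needed and the less stringent the resulting threshold becomes compared with Bonferroni.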

How to perform enrichment analysis?

  1. Gene-set enrichment analysis (GSEA)
    • As simple as the Enrichr web tool
  2. Functional or tissue-specific enrichment (Finucane et al. 2015)
    • SNP-level resolution is required


Data Min size Mean size Genome coverage
Local ancestry 10kb 13kb 74%
ENCODE annotation 0.150kb 10%
Intersection 10kb 11kb 70%

What type of interactions do we detect?

  • Our top associated genes have been linked to epigenetic changes, e.g. (Wan et al. 2015)
  • Admixed interaction mapping was already applied to gene expression and methylation data (Park et al. 2016)

Hypothesis on the mechanism of interaction:
Smoking → Up/Down Methylation → COPD-related phenotype

Conclusions

A genome-wide interaction scan on local ancestry has the potential to discover new gene-environment interactions
(the number of tests is dramatically reduced)

  • solved: population stratification, heteroskedasticity
  • todo: multiple testing correction, interpretation

In the future, we plan to gain insights into the interaction mechanism (collaborative work)

  • Gene Expression data in COPDGene
  • Methylation data in COPDGene

Thank you

References

Aschard et al. 2015. “Leveraging local ancestry to detect gene-gene interactions in genome-wide data.” BMC Genetics 16 (1): 124. doi:10.1186/s12863-015-0283-z.

———. 2017. “Evidence for large-scale gene-by-smoking interaction effects on pulmonary function.” International Journal of Epidemiology 46 (3): 894–904. doi:10.1093/ije/dyw318.

Davis et al. 2016. “An Efficient Multiple-Testing Adjustment for eQTL Studies That Accounts for Linkage Disequilibrium Between Variants.” The American Journal of Human Genetics 98 (1): 216–24.

Finucane et al. 2015. “Partitioning Heritability by Functional Annotation Using Genome-Wide Association Summary Statistics.” Nature Genetics 47 (11): 1228.

Ge et al. 2017. “Phenome-Wide Heritability Analysis of the UK Biobank.” PLoS Genetics 13 (4): e1006711.

Joo et al. 2016. “Multiple Testing Correction in Linear Mixed Models.” Genome Biology 17 (1): 62.

Kumar et al. 2010. “Genetic Ancestry in Lung-Function Predictions.” New England Journal of Medicine 363 (4): 321–30. doi:10.1056/NEJMoa0907897.

Park et al. 2016. “An Ancestry Based Approach for Detecting Interactions.”

Parker et al. 2014. “Admixture mapping identifies a quantitative trait locus associated with FEV1/FVC in the COPDGene Study.” Genetic Epidemiology 38 (7): 652–59. doi:10.1002/gepi.21847.

Renier et al. 2017. “HHS Public Access” 165 (7): 1789–1802. doi:10.1016/j.cell.2016.05.007.Mapping.

Sul et al. 2016. “Accounting for Population Structure in Gene-by-Environment Interactions in Genome-Wide Association Studies Using Mixed Models.” PLoS Genetics 12 (3): e1005849. doi:10.1371/journal.pgen.1005849.

Wain et al. 2015. “Novel insights into the genetics of smoking behaviour, lung function, and chronic obstructive pulmonary disease (UK BiLEVE): A genetic association study in UK Biobank.” The Lancet Respiratory Medicine 3 (10): 769–81. doi:10.1016/S2213-2600(15)00283-0.

Wan et al. 2015. “Smoking-associated site-specific differential methylation in buccal mucosa in the COPDGene study.” American Journal of Respiratory Cell and Molecular Biology 53 (2): 246–54. doi:10.1165/rcmb.2014-0103OC.

Zaitlen et al. 2014. “Leveraging population admixture to characterize the heritability of complex traits.” Nature Genetics 46 (12): 1356–62. doi:10.1038/ng.3139.